AITopics | virtual queue

1f01cdfd07f0ec78124627cf32d0d83c-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-25-2026, 00:31:37 GMT

artificial intelligence, probability, queue, (17 more...)

Technology: Information Technology > Artificial Intelligence (0.46)

1f01cdfd07f0ec78124627cf32d0d83c-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-7-2026, 20:13:53 GMT

customer, probability, queue, (16 more...)

Technology: Information Technology > Artificial Intelligence (0.46)

Demonstration of effective UCB-based routing in skill-based queues on real-world data

van Kempen, Sanne, Sanders, Jaron, Sloothaak, Fiona, Wolf, Maarten G.

arXiv.org Artificial Intelligence | Jun-26-2025

This paper is about optimally controlling skill-based queueing systems such as data centers, cloud computing networks, and service systems. By means of a case study using a real-world data set, we investigate the practical implementation of a recently developed reinforcement learning algorithm for optimal customer routing. Our experiments show that the algorithm efficiently learns and adapts to changing environments and outperforms static benchmark policies, indicating its potential for live implementation. We also augment the real-world applicability of this algorithm by introducing a new heuristic routing rule to reduce delays. Moreover, we show that the algorithm can optimize for multiple objectives: in addition to payoff maximization, secondary objectives such as server load fairness and customer waiting time reduction can be incorporated. Tuning parameters are used for balancing inherent performance trade-offs. Lastly, we investigate the sensitivity to estimation errors and parameter tuning, providing valuable insights for implementing adaptive routing algorithms in complex real-world queueing systems.
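To make the idea of UCB-based routing concrete, here is a minimal sketch of an upper-confidence-bound rule that picks a server for each arriving customer. It is not the algorithm from the paper: it assumes a single customer type, Bernoulli payoffs, and no queueing or skill constraints. The names `ucb_route`, `simulate`, the exploration constant `c`, and the payoff probabilities are all hypothetical.

```python
import math
import random


def ucb_route(counts, means, t, c=2.0):
    """Return the index of the server with the highest UCB index.

    counts[i] is how often server i was chosen so far and means[i] is its
    empirical mean payoff. The constant c is an assumed tuning parameter.
    """
    for i, n in enumerate(counts):
        if n == 0:
            return i  # try every server once before using the index
    return max(
        range(len(counts)),
        key=lambda i: means[i] + math.sqrt(c * math.log(t) / counts[i]),
    )


def simulate(true_payoffs, horizon=5000, seed=0):
    """Route `horizon` customers and learn the payoffs along the way."""
    rng = random.Random(seed)
    k = len(true_payoffs)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, horizon + 1):
        i = ucb_route(counts, means, t)
        # Assumed Bernoulli payoff for the chosen customer-server match.
        reward = 1.0 if rng.random() < true_payoffs[i] else 0.0
        counts[i] += 1
        means[i] += (reward - means[i]) / counts[i]  # running average
    return counts, means


counts, means = simulate([0.2, 0.5, 0.8])
print(counts, [round(m, 3) for m in means])
```

In this toy setting the rule sends most customers to the server with the highest payoff probability while still sampling the others occasionally, which is the exploration-exploitation behavior the abstract refers to.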

customer, machine learning, reinforcement learning, (21 more...)

2506.20543

Country: North America (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.49)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Doubly-Bounded Queue for Constrained Online Learning: Keeping Pace with Dynamics of Both Loss and Constraint

Wang, Juncheng, Yan, Bingjie, Liu, Yituo

arXiv.org Artificial Intelligence | Jan-14-2025

We consider online convex optimization with time-varying constraints and conduct performance analysis using two stringent metrics: dynamic regret with respect to the online solution benchmark, and hard constraint violation that does not allow any compensated violation over time. We propose an efficient algorithm called Constrained Online Learning with Doubly-bounded Queue (COLDQ), which introduces a novel virtual queue that is both lower and upper bounded, allowing tight control of the constraint violation without the need for the Slater condition. We prove via a new Lyapunov drift analysis that COLDQ achieves $O(T^{\frac{1+V_x}{2}})$ dynamic regret and $O(T^{V_g})$ hard constraint violation, where $V_x$ and $V_g$ capture the dynamics of the loss and constraint functions. For the first time, the two bounds smoothly approach the best-known $O(T^{\frac{1}{2}})$ regret and $O(1)$ violation as the dynamics of the losses and constraints diminish. For strongly convex loss functions, COLDQ matches the best-known $O(\log{T})$ static regret while maintaining the $O(T^{V_g})$ hard constraint violation. We further introduce an expert-tracking variation of COLDQ, which achieves the same performance bounds without any prior knowledge of the system dynamics. Simulation results demonstrate that COLDQ outperforms the state-of-the-art approaches.
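The abstract does not state the COLDQ queue update, so the following is only an illustrative sketch of what a virtual queue bounded from both sides can look like: a running total of constraint violation that is clipped into a fixed interval. The function name `update_queue`, the bounds `q_min` and `q_max`, and the violation values are assumptions, not taken from the paper.

```python
def update_queue(q, violation, q_min=1.0, q_max=10.0):
    """Add the latest constraint violation and clip the result.

    A classical virtual queue uses max(q + violation, 0). Here the value is
    kept inside [q_min, q_max]; both bounds are hypothetical constants.
    """
    return min(max(q + violation, q_min), q_max)


q = 1.0
trace = []
# Assumed per-round constraint values g_t(x_t); positive means violated.
for g in [0.5, 2.0, 20.0, -3.0, -50.0]:
    q = update_queue(q, g)
    trace.append(q)

print(trace)  # [1.5, 3.5, 10.0, 7.0, 1.0]
```

The upper bound keeps a single large violation from dominating later decisions, and the lower bound keeps the constraint term from vanishing after a long feasible stretch. How COLDQ chooses and uses its bounds is specified in the paper, not here.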

constraint, constraint violation, violation, (16 more...)

2412.10703

Country:

Asia > China > Hong Kong (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Education > Educational Setting > Online (0.61)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)

Learning payoffs while routing in skill-based queues

van Kempen, Sanne, Sanders, Jaron, Sloothaak, Fiona, Wolf, Maarten G.

arXiv.org Artificial Intelligence | Dec-13-2024

Motivated by applications in service systems, we consider queueing systems where each customer must be handled by a server with the right skill set. We focus on optimizing the routing of customers to servers in order to maximize the total payoff of customer-server matches. In addition, customer-server dependent payoff parameters are assumed to be unknown a priori. We construct a machine learning algorithm that adaptively learns the payoff parameters while maximizing the total payoff and prove that it achieves polylogarithmic regret. Moreover, we show that the algorithm is asymptotically optimal up to logarithmic terms by deriving a regret lower bound. The algorithm leverages the basic feasible solutions of a static linear program as the action space. The regret analysis overcomes the complex interplay between queueing and learning by analyzing the convergence of the queue length process to its stationary behavior. We also demonstrate the performance of the algorithm numerically, and have included an experiment with time-varying parameters highlighting the potential of the algorithm in non-static environments.

artificial intelligence, customer, machine learning, (16 more...)

2412.10168

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Brabant > Eindhoven (0.04)

Genre: Research Report (1.00)

Industry: Telecommunications (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.45)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.45)

Safe and Efficient Online Convex Optimization with Linear Budget Constraints and Partial Feedback

Liu, Shanqi, Liu, Xin

arXiv.org Artificial Intelligence | Dec-5-2024

Online Convex Optimization (OCO) provides a versatile framework for studying online decision-making in dynamic and uncertain environments [1]-[3]. Within this framework, a learner continuously adapts its decisions to minimize a loss function or maximize a utility function while interacting with the environment in real-time. OCO has wide-ranging applications, including resource allocation in network systems [4]-[8], load balancing in server systems [9]-[11], online advertising [12], [13], and personalized healthcare [14], [15]. In the OCO framework, the learner chooses a decision x

However, such "anytime safe projection" methods may encounter three potential challenges when dealing with budget constraints: 1) they often require a substantial initial period to explore and learn the consumption matrix; 2) determining the "correct" safe constraint set based on an estimated consumption matrix is difficult, and they are very likely to be overly conservative, which ensures safety but degrades performance; 3) the projection-based methods (e.g., projected online gradient descent) may require heavy computation because it is equivalent to solving a constrained quadratic optimization
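For readers unfamiliar with projected online gradient descent, the sketch below shows the projection step in the one case where it has a closed form: a single linear budget constraint a·x <= b with a known consumption vector. This is a simplification chosen for illustration. The excerpt concerns an unknown consumption matrix, and with several constraints the projection generally requires a quadratic-program solver. The names `project_halfspace`, `projected_ogd`, the step size, and the toy gradients are assumptions.

```python
def project_halfspace(x, a, b):
    """Euclidean projection of x onto the half-space {y : a.y <= b}."""
    slack = sum(ai * xi for ai, xi in zip(a, x)) - b
    if slack <= 0:
        return list(x)  # already within budget
    norm_sq = sum(ai * ai for ai in a)
    return [xi - slack * ai / norm_sq for ai, xi in zip(a, x)]


def projected_ogd(grads, a, b, x0, step=0.1):
    """Take a gradient step per round, then project back onto the budget set."""
    x = list(x0)
    history = [x]
    for g in grads:
        moved = [xi - step * gi for xi, gi in zip(x, g)]
        x = project_halfspace(moved, a, b)
        history.append(x)
    return history


# Assumed toy problem: the loss gradient always pushes both coordinates up,
# while the budget a.x <= 1 caps their sum.
a, b = [1.0, 1.0], 1.0
history = projected_ogd([[-1.0, -1.0]] * 20, a, b, x0=[0.0, 0.0])
final = history[-1]
print(final)
```

Every iterate stays within the budget because the projection runs after each step, which is the "anytime safe" property. The cost is that each round needs a projection, which is cheap here but not in general.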

budget constraint, constraint, optimization, (11 more...)

2412.03983

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > New York (0.04)

Genre: Research Report (0.82)

Industry: Energy > Power Industry (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)

One Queue Is All You Need: Resolving Head-of-Line Blocking in Large Language Model Serving

Patke, Archit, Reddy, Dhemath, Jha, Saurabh, Qiu, Haoran, Pinto, Christian, Cui, Shengkun, Narayanaswami, Chandra, Kalbarczyk, Zbigniew, Iyer, Ravishankar

arXiv.org Artificial Intelligence | Jun-5-2024

Large language models (LLMs) have become an increasingly important workload for cloud providers catering to both enterprise and consumer applications. LLM inference requests from these applications have end-to-end latency SLOs that must be adhered to in production settings. However, existing LLM serving systems focus on optimization objectives such as request serving throughput or request execution latency rather than the end-to-end latency SLOs. Achieving end-to-end SLOs for latency-sensitive requests is challenging due to head-of-line (HOL) blocking in the request queue, which results from bursty arrival rates and insufficient resources. To address the above challenge, we propose QLM, a multi-model queue management framework for LLM serving. QLM uses stochastic programming to orchestrate the actions of multiple LLM Serving Operations (LSOs) to reduce HOL blocking and maximize SLO attainment. Specifically, QLM uses the following LSOs: model swapping, request eviction, GPU-CPU state swapping, load balancing, and warm model start. Evaluation on heterogeneous GPU devices and models with a real-world LLM serving dataset shows that QLM improves SLO attainment by 40-90% and throughput by 20-400% while maintaining or improving device utilization compared to other state-of-the-art LLM serving systems.
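Head-of-line blocking can be illustrated with a toy single-device queue: one long request at the front delays several short, latency-sensitive requests behind it. The sketch compares first-in-first-out order with an earliest-deadline-first reordering. The reordering is a stand-in chosen for illustration and is not QLM's stochastic-programming approach. The function `slo_attainment` and the request numbers are hypothetical.

```python
def slo_attainment(requests):
    """Serve requests in the given order on one device.

    Each request is (service_time, deadline) in seconds and all are assumed
    to arrive at time 0. Returns the fraction that finish by their deadline.
    """
    clock = 0.0
    met = 0
    for service_time, deadline in requests:
        clock += service_time
        if clock <= deadline:
            met += 1
    return met / len(requests)


# One long batch-style request ahead of three short interactive ones.
queue = [(10.0, 60.0), (1.0, 3.0), (1.0, 4.0), (1.0, 5.0)]

fifo = slo_attainment(queue)
edf = slo_attainment(sorted(queue, key=lambda r: r[1]))
print(fifo, edf)  # 0.25 1.0
```

Under FIFO only the long request meets its deadline, while serving by deadline lets all four requests meet theirs without adding any capacity.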

queue, request group, virtual queue, (14 more...)

2407.00047

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Energy > Power Industry (0.34)
Information Technology > Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
